Assessed Report
Semester 2, weeks 6–10
Instructions
Your group is required to submit a PDF report of a maximum of 6 pages by 12 noon on Friday the 29th of March 2024. As this is a group-based report, no extensions are allowed. Only one person is required to submit on behalf of the entire group.
You must ensure that every member of the group has joined the group on LEARN (see the group name on your desk). Members who have not joined will receive a grade of 0.
For the report, you are required to address research questions RQ1–RQ5 as detailed in the Research Questions (RQs) section below. Each research question relates to a different week of teaching in the course. Furthermore, each research question is broken down into sub-tasks to help you get started in your analysis.
Report Structure
The report should be titled “Assessed Report (Group NUMBER.LETTER)”, where NUMBER.LETTER is replaced by your group name; for example, “Assessed Report (Group 0.A)”.
In the author part of the PDF file, specify the exam number of each member of the group (for example: B000001, B000002, B000003, …). The exam number starts with the letter B and can be found on your student card. If your exam number is not in the list of authors, you will receive a grade of 0 for the report.
The report should have the following sections:
Introduction: contains a brief introduction to the data and questions being investigated.
For example: How many units and variables are there? Are there any impossible or missing values? Which research questions are you going to investigate? What are the types of the variables and which ones are used for the investigation?
Analysis: contains the details and the write-up of your results.
This should include text, tables, and/or plots, and you should ensure you interpret everything that is presented.
Discussion: provides a short summary which answers the questions being investigated with a few take-home messages. It should also include ideas for future work.
Appendix A (optional): for additional figures and tables. This does not count towards the page limit; if all your figures and tables fit in the main part of the report, you can skip it. Any figure or table included here must be referenced in the main part of the report.
Appendix B (compulsory): shows all the R code used. This does not count towards the page limit but, unlike Appendix A, must be included in the report.
With the exception of Appendix B, the report should not contain any visible R code or R output. In other words, it should only feature text, plots, and tables.
The default tables produced by R are not appropriate for a report. You need to use a function such as kbl() from the kableExtra package or kable() from the knitr package to produce report-quality tables.
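As a minimal sketch, assuming the kableExtra package is installed and the report is knitted to PDF via LaTeX, a small data frame of descriptive statistics can be passed to kbl(). The five values in toy_minutes are made up purely for illustration and are not from the assessment data.

```r
library(kableExtra)

# Made-up minutes per day, for illustration only (NOT the assessment data)
toy_minutes <- c(130, 145, 160, 120, 155)

# One row of descriptive statistics
desc_stats <- data.frame(
  n      = length(toy_minutes),
  Mean   = mean(toy_minutes),
  SD     = sd(toy_minutes),
  Median = median(toy_minutes),
  Min    = min(toy_minutes),
  Max    = max(toy_minutes)
)

# kbl() builds a LaTeX table; booktabs = TRUE gives cleaner horizontal rules
tab <- kbl(desc_stats,
           format   = "latex",
           booktabs = TRUE,
           digits   = 2,
           caption  = "Descriptive statistics (illustrative values)") |>
  kable_styling(latex_options = "HOLD_position")
```

In a chunk with echo = FALSE, printing tab places the formatted table in the PDF without showing the code.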
File Submission
Only one person is required to submit on behalf of the entire group.
To submit, go to the course LEARN page, click “Assessment”, and then click “Submit Assessed report (PDF file only)”.
Ensure the name of the submitted file includes the group name, for example AR Group 0.A.pdf.
Peer Assessment of Contribution
Make sure you rate the contribution of everyone in your group on LEARN using the WebPA tool. This will open one week prior to the report due date, and will close one week after the due date. Announcements will be sent on both occasions.
You must rate the contribution of each person in your group on a scale from 1 to 5.
If you do not submit any ratings, the ratings given by the other group members will carry more weight.
You can find a visual illustration of how peer assessment of contribution works at this website. Please note that this is for demonstration purposes only.
Lab Attendance
The report is based on group work, and you must attend the labs to work with the members of your table group.
If you attend none of the labs during the five weeks leading up to the assessed report, you will receive a grade of 0.
Data and Background
A researcher is interested in investigating the time that high-school pupils in the UK spend socialising via the internet.
She obtained a list of all high-school students in the country. From that list, she randomly selected 40 pupils to participate in the study.
The selected pupils (ppt: 1-40) were sent a questionnaire that asked them to classify whether they lived in a rural or urban area (location: Rural/Urban), and to give the average number of minutes per day they spent on the internet engaged in various social activities (for example, on social networking sites or using instant messenger).
The pupils were asked to report the average number of minutes per day spent socialising via the internet separately for the school holidays (time_holidays) and the school term (time_term).
The researcher is interested in investigating the research questions outlined in the section Research Questions (RQs) using the data file that can be accessed at the following link: https://uoepsy.github.io/data/dapr1_ar_data_2324.csv. The variables stored in the data are also described in the table below.
| Variable | Description |
|---|---|
| ppt | Unique participant identifier |
| location | Self-reported living area: Rural / Urban |
| time_holidays | Daily minutes spent socialising online (average over a period of 7 days) during school holidays |
| time_term | Daily minutes spent socialising online (average over a period of 7 days) during school term |
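The checks listed for the introduction (number of units and variables, variable types, missing values, impossible values) can be collected in one place. This sketch assumes the file has already been read into a data frame (for example with read_csv() on the URL above). check_socialising_data() is a hypothetical helper written for this illustration, and the four-row toy data frame is invented so that each kind of problem appears once. Treating anything outside 0 to 1440 minutes (24 × 60) as impossible is an assumption about what counts as an impossible value.

```r
# check_socialising_data() is a hypothetical helper, not from any package.
# With the real file, first read it, e.g.:
#   dat <- read.csv("https://uoepsy.github.io/data/dapr1_ar_data_2324.csv")
check_socialising_data <- function(dat) {
  minute_vars <- c("time_holidays", "time_term")
  # Assumption: minutes per day cannot be negative or exceed 24 * 60 = 1440
  out_of_range <- dat[minute_vars] < 0 | dat[minute_vars] > 1440
  list(
    n_units         = nrow(dat),
    n_variables     = ncol(dat),
    types           = vapply(dat, function(col) class(col)[1], character(1)),
    missing         = colSums(is.na(dat)),
    impossible_time = colSums(out_of_range, na.rm = TRUE),
    # non-missing locations other than the two allowed labels
    bad_location    = sum(!is.na(dat$location) &
                            !dat$location %in% c("Rural", "Urban"))
  )
}

# Invented toy data with one problem of each kind
toy <- data.frame(
  ppt           = 1:4,
  location      = c("Rural", "Urban", "urban", NA),
  time_holidays = c(150, -5, 200, 90),
  time_term     = c(100, 80, 1500, NA)
)
res <- check_socialising_data(toy)
str(res)
```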
Research Questions (RQs)
Throughout the entire assessed report, use a 5% significance level (i.e., \(\alpha = .05\)) and the p-value method for hypothesis tests.
RQ1. Is the mean time spent socialising online during school holidays more than 120 minutes per day? If so, is the effect of practical significance?
RQ2. Do high-school pupils living in rural areas spend more time socialising online during school holidays than those living in urban areas? If so, is the effect of practical significance?
RQ3. Does the mean time spent socialising online differ between school holidays and school term? If so, is the effect of practical significance?
RQ4. Is there an association between a pupil’s location and spending more time socialising online during the holidays than during term time?
RQ5. Numerically summarise and plot the relationship between time spent socialising online during school holidays and term time. Can you suggest any other questions of interest that could be addressed in future work using this dataset?
RQ1 (week 6) sub-tasks
RQ1. Is the mean time spent socialising online during school holidays more than 120 minutes per day? If so, is the effect of practical significance?
Read the data into R and inspect it. What is the type of each variable? Are there any impossible or missing values?[1]
Create a plot displaying the distribution of the variable of interest and provide a table of descriptive statistics to summarise it numerically.[2]
Perform a one sample t-test, making sure to report the results in the context of the research question.[3]
- Compute and interpret in context a 95% confidence interval for the population mean time spent socialising online during school holidays.[4]
- If you reject the null hypothesis, compute the effect size (Cohen’s D), and discuss whether the effect is also of practical significance.[5]
Verify whether the assumptions underlying the one sample t-test are satisfied.
In the introduction section, write a brief introduction to the data and questions being investigated. How many units and variables are there? Are there any impossible or missing values? What are the types of the variables and which ones are used for the investigation?
In the analysis section, provide a write-up of your results so far, using appropriate rounding and making sure to report your results in the context of the investigation.
In the discussion section, provide a brief summary which answers the question being investigated with a few take-home messages.
Create the two appendices, which do not count towards the page limit:
- Appendix A, for figures or tables that don’t fit within the 6-page limit
- Appendix B, displaying all of the R code used.[6]
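A minimal sketch of the RQ1 steps, run on simulated minutes rather than the real data (the rnorm() values are invented and say nothing about the study). It shows the one-sided test against 120, the two-sided 95% CI obtained both from t.test() and from the formula \(\bar{x} \pm t^* s/\sqrt{n}\), the one-sample Cohen's D, and one possible normality check (Shapiro-Wilk via shapiro.test(); a histogram or QQ plot is a common alternative).

```r
# Simulated minutes for illustration only (NOT the assessment data)
set.seed(2024)
time_holidays <- round(rnorm(40, mean = 135, sd = 40))

mu0 <- 120  # null value in RQ1

# One-sided test: H1 says the population mean exceeds 120
res_greater <- t.test(time_holidays, mu = mu0, alternative = "greater")
reject_H0   <- res_greater$p.value < 0.05  # p-value method at alpha = .05

# A one-sided test returns a one-sided interval (upper limit Inf),
# so the two-sided 95% CI comes from the default two-sided call
ci_t_test <- t.test(time_holidays, mu = mu0)$conf.int

# The same interval from the formula xbar +/- t* * s / sqrt(n)
n     <- length(time_holidays)
xbar  <- mean(time_holidays)
s     <- sd(time_holidays)
tstar <- qt(0.975, df = n - 1)  # cuts 0.025 in each tail of t(n - 1)
ci_manual <- c(xbar - tstar * s / sqrt(n), xbar + tstar * s / sqrt(n))

# Cohen's D for a one-sample design
D <- (xbar - mu0) / s

# One option for checking the normality assumption
normality <- shapiro.test(time_holidays)
```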
RQ2 (week 7) sub-tasks
RQ2. Do high-school pupils living in rural areas spend more time socialising online during school holidays than those living in urban areas? If so, is the effect of practical significance?
Update your figures and descriptive statistics table to display and summarise the distribution of any additional variable(s) of interest.
Perform an independent samples t-test, making sure to report the results in the context of the research question.[7]
- Compute and interpret in context a 95% confidence interval for the population difference in mean time spent socialising online during school holidays between the two locations.[8]
- If you reject the null hypothesis, compute the effect size (Cohen’s D), and discuss whether the effect is also of practical significance.[9]
Verify whether the assumptions underlying the independent samples t-test are satisfied.
Update the introduction, analysis, and discussion sections to incorporate the findings from this week.
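The two-step procedure for RQ2 can be sketched as follows. The toy data frame is simulated (group sizes and means are arbitrary), and reading a var.test() p-value of at least .05 as "equal variances" is an assumption that follows the 5% level used throughout. The manual pooled quantities mirror the hand formulas in the hints and are checked against t.test().

```r
# Simulated data for illustration only (NOT the assessment data)
set.seed(7)
toy <- data.frame(
  location      = rep(c("Rural", "Urban"), times = c(18, 22)),
  time_holidays = c(round(rnorm(18, 150, 35)), round(rnorm(22, 130, 35)))
)

# Step 1: F test of equal variances (H0: ratio of variances = 1)
vt <- var.test(time_holidays ~ location, data = toy)
equal_var <- vt$p.value >= 0.05

# Step 2: "Rural" sorts before "Urban", so the tested difference is
# mean(Rural) - mean(Urban) and alternative = "greater" matches RQ2
tt <- t.test(time_holidays ~ location, data = toy,
             alternative = "greater", var.equal = equal_var)

x1 <- toy$time_holidays[toy$location == "Rural"]
x2 <- toy$time_holidays[toy$location == "Urban"]
n1 <- length(x1)
n2 <- length(x2)

# Pooled SD and the two standard errors from the hint formulas
sp        <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
se_pooled <- sp * sqrt(1 / n1 + 1 / n2)
se_welch  <- sqrt(var(x1) / n1 + var(x2) / n2)
se_used   <- if (equal_var) se_pooled else se_welch

# Two-sided 95% CI under equal variances, by hand and via t.test()
ci_manual <- (mean(x1) - mean(x2)) + c(-1, 1) * qt(0.975, n1 + n2 - 2) * se_pooled
ci_pooled <- t.test(time_holidays ~ location, data = toy, var.equal = TRUE)$conf.int

# Cohen's D with the pooled SD; delta0 = 0 under H0
D <- ((mean(x1) - mean(x2)) - 0) / sp
```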
RQ3 (week 8) sub-tasks
RQ3. Does the mean time spent socialising online differ between school holidays and school term? If so, is the effect of practical significance?
Update your figures and descriptive statistics table to display and summarise the distribution of any additional variable(s) of interest.
Perform a paired samples t-test, making sure to report the results in the context of the research question.[10]
Compute and interpret in context a 95% confidence interval for the population mean difference in time spent socialising online between school holidays and school term time.
If you reject the null hypothesis, compute the effect size (Cohen’s D), and discuss whether the effect is also of practical significance.
Verify whether the assumptions underlying the paired samples t-test are satisfied.
Update the introduction, analysis, and discussion sections to incorporate the findings from this week.
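For RQ3 each pupil contributes two measurements, so the analysis works on the within-pupil differences. This sketch uses simulated paired values (not the study data) to show that the paired test is the same as a one-sample t-test on the differences. D is computed as the mean difference divided by the SD of the differences, which is one common definition for paired designs; other variants exist, so check which one the course expects.

```r
# Simulated paired minutes for illustration only (NOT the assessment data)
set.seed(11)
time_term     <- round(rnorm(40, mean = 100, sd = 30))
time_holidays <- time_term + round(rnorm(40, mean = 25, sd = 20))

# RQ3 asks whether the means *differ*, so the test is two-sided
paired_res <- t.test(time_holidays, time_term, paired = TRUE)

# Equivalent: one-sample t-test on the pupil-level differences
diffs    <- time_holidays - time_term
diff_res <- t.test(diffs, mu = 0)

# Manual 95% CI for the population mean difference
n         <- length(diffs)
ci_manual <- mean(diffs) + c(-1, 1) * qt(0.975, n - 1) * sd(diffs) / sqrt(n)

# Cohen's D on the differences (mean difference / SD of differences)
D <- mean(diffs) / sd(diffs)

# The normality assumption concerns the differences, not each variable
normality_diffs <- shapiro.test(diffs)
```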
RQ4 (week 9) sub-tasks
RQ4. Is there an association between a pupil’s location and spending more time socialising online during the holidays than during term time?
- Create a new variable called holidays_more, which takes the value TRUE if time_holidays > time_term, and FALSE otherwise.[11]
- Explore the association between location and holidays_more either via a graph or a contingency table.[12]
- Perform a chi-squared test of independence, making sure to report the results in the context of the research question.[13]
- Compute the Pearson residuals and report them in context.[14]
- Verify whether the assumptions underlying the chi-squared test of independence are satisfied.
- Update the introduction, analysis, and discussion sections to incorporate the findings from this week.
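A sketch of the RQ4 workflow on simulated data (location is drawn at random and the times are invented). It builds holidays_more, tabulates it against location, runs chisq.test(), and extracts the Pearson residuals and the expected counts used in the assumption check. Converting both variables to factors with fixed levels is a design choice here so that the table is always 2 × 2.

```r
# Simulated data for illustration only (NOT the assessment data)
set.seed(3)
toy <- data.frame(
  location      = sample(c("Rural", "Urban"), 40, replace = TRUE),
  time_holidays = round(rnorm(40, mean = 140, sd = 40)),
  time_term     = round(rnorm(40, mean = 110, sd = 40))
)

# TRUE when a pupil reports more online socialising in the holidays
toy$holidays_more <- toy$time_holidays > toy$time_term

# Fixed factor levels keep the table 2 x 2
tab <- table(location      = factor(toy$location, levels = c("Rural", "Urban")),
             holidays_more = factor(toy$holidays_more, levels = c(FALSE, TRUE)))

# For 2 x 2 tables chisq.test() applies Yates' continuity correction by default
chi <- chisq.test(tab)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res <- chi$residuals

# Assumption check: all expected counts should be at least 5
expected_ok <- all(chi$expected >= 5)
```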
RQ5 (week 10) sub-tasks
RQ5. Numerically summarise and plot the relationship between time spent socialising online during school holidays and term time. Can you suggest any other questions of interest that could be addressed in future work using this dataset?
Summarise the association between the two numeric variables by computing the correlation. Note that no hypothesis test is required here, as the question asks for a descriptive summary.
Create a scatterplot visualising the relation between time spent socialising online during school holidays and school term time.
Provide an interpretation of the scatterplot and correlation in context.
Can you think of any other questions that could be investigated using this dataset? Please suggest at least one in the discussion as future work.
Update the introduction, analysis, and discussion sections to incorporate the findings from this week.
Knit your Rmd file to PDF, and submit. Ensure you follow the report instructions provided.
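For the correlation sub-task of RQ5, here is a sketch on simulated values (the linear relationship is built in by construction and is not a finding). It computes Pearson's r with cor() and checks it against the definition: the covariance divided by the product of the two standard deviations.

```r
# Simulated minutes for illustration only (NOT the assessment data)
set.seed(5)
time_term     <- round(rnorm(40, mean = 100, sd = 30))
time_holidays <- round(50 + 0.8 * time_term + rnorm(40, mean = 0, sd = 20))

# Pearson correlation: a descriptive summary, no hypothesis test
r <- cor(time_holidays, time_term)

# Same value from the definition: covariance / (SD_x * SD_y)
r_manual <- cov(time_holidays, time_term) / (sd(time_holidays) * sd(time_term))
```

For the scatterplot, one option is ggplot() with time_term and time_holidays mapped to the x and y aesthetics, plus geom_point() from ggplot2.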
Footnotes
[1] Hint: The following functions may be useful: read_csv(), head(), str(), glimpse(), summary().
[2] Hint: The following functions may be useful: geom_histogram(), geom_density(), geom_boxplot(), summarise(), mean(), sd(), n(), min(), max(), median(), IQR(), and describe() from psych.
[3] Hint: t.test(<data>$<variable>, mu = <null value>, alternative = <"two.sided", "less", or "greater">)
[4] Hint: The t.test() function also returns a CI (note that with alternative = "greater" or "less" this interval is one-sided, so for a two-sided 95% CI use the default alternative = "two.sided"). Alternatively, use the formula
\[
\left[\bar{x} - t^* \times \frac{s}{\sqrt{n}}, \ \bar{x} + t^* \times \frac{s}{\sqrt{n}}\right]
\]
where:
- \(\pm t^*\) are the quantiles from a \(t(n-1)\) distribution cutting a probability of 0.025 in each tail
- \(\bar{x}\) is the sample mean
- \(s\) is the sample standard deviation
- \(n\) is the sample size
[5] Hint: You could use cohens_d() from the effectsize package or compute it manually with the formula:
\[
D = \frac{\bar{x} - \mu_0}{s}
\]
where \(\mu_0\) is the hypothesised population mean in the null hypothesis.
[6] Hint: To create Appendix B, ensure the code chunk below is at the end of your Rmd file:

````
# Appendix B: R code

```{r ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
```
````
[7] Hint: Step 1. Test for equality of variances using the var.test() function.
Step 2. Depending on the data format, you may need to use one of these two alternatives:
t.test(<data>$<variable1>, <data>$<variable2>, mu = <null value>, alternative = <"two.sided", "less", or "greater">, var.equal = <TRUE or FALSE>)
t.test(<data>$<variable1> ~ <data>$<group>, mu = <null value>, alternative = <"two.sided", "less", or "greater">, var.equal = <TRUE or FALSE>)
Instead of using <data>$, you can drop it if you specify the data frame via data = <data>. With the formula version, the difference tested is the first level of <group> minus the second (for a character variable, levels are in alphabetical order), which determines whether "less" or "greater" is appropriate.
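[8] Hint: The t.test() function also returns a CI (two-sided only when alternative = "two.sided"). Alternatively, you can compute it manually via:
\[
(\bar{x}_1 - \bar{x}_2) \pm t^* \times SE_{\bar{x}_1 - \bar{x}_2}
\]
When equality of variances holds, the SE is
\[
SE_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},
\]
and \(t^*\) comes from a \(t(n_1 + n_2 - 2)\) distribution. When equality of variances does not hold, the SE is
\[
SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\]
and t.test(..., var.equal = FALSE) uses the Welch approximation for the degrees of freedom.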
[9] Hint: The function cohens_d() from the effectsize package may be useful. If you have equality of variances, you can also compute it by hand as:
\[
D = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s_p}
\]
where \(s_p\) is the pooled standard deviation and \(\delta_0\) is the hypothesised population difference in means in the null hypothesis.
If you don’t have equality of variances, use cohens_d() from the effectsize package to compute D.
[10] Hint: Depending on the data format, one of the following may be appropriate:
t.test(<data>$<variable1>, <data>$<variable2>, paired = TRUE, ...)
t.test(<data>$<variable1> ~ <data>$<group>, paired = TRUE, ...)
t.test(<data>$<variable1> - <data>$<variable2>, ...)
[12] Hint: The functions table() or xtabs() may be useful for creating a contingency table. The functions geom_mosaic() from ggmosaic, plot(table()), or mosaicplot() may be useful for a plot.
[13] Hint: The function chisq.test() may be useful.
[14] Hint: If you store the result of the chi-squared test above in an object, you can extract the Pearson residuals from the object using $ indexing (the residuals element).